
    Polar Codes with exponentially small error at finite block length

    We show that the entire class of polar codes (up to a natural necessary condition) converges to capacity at block lengths polynomial in the gap to capacity, while simultaneously achieving failure probabilities that are exponentially small in the block length (i.e., decoding fails with probability $\exp(-N^{\Omega(1)})$ for codes of length $N$). Previously this combination was known only for one specific family within the class of polar codes, whereas we establish it whenever the polar code exhibits a condition necessary for any polarization. Our results adapt and strengthen a local analysis of polar codes due to the authors with Nakkiran and Rudra [Proc. STOC 2018]. Their analysis related the time-local behavior of a martingale to its global convergence, and this allowed them to prove that the broad class of polar codes converges to capacity at polynomial block lengths. Their analysis easily adapts to show exponentially small failure probabilities, provided the associated martingale, the "Arikan martingale", exhibits a corresponding strong local effect. The main contribution of this work is a much stronger local analysis of the Arikan martingale. This leads to the general result claimed above. In addition to our general result, we also show, for the first time, polar codes that achieve failure probability $\exp(-N^{\beta})$ for any $\beta < 1$ while converging to capacity at block length polynomial in the gap to capacity. Finally, we also show that the "local" approach can be combined with any analysis of failure probability of an arbitrary polar code to get essentially the same failure probability while achieving block length polynomial in the gap to capacity.
    Comment: 17 pages, Appeared in RANDOM'1
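    As a concrete illustration of the polarization phenomenon discussed above, the following minimal Python sketch simulates the Arikan martingale for the classical 2x2 kernel over the binary erasure channel, where one polarization step maps an erasure probability $z$ to $z^2$ or $2z - z^2$ with equal probability. This standard BEC recursion is background material, not the paper's analysis; the function name and parameters are ours.

```python
import random

def arikan_bec_martingale(z0: float, depth: int) -> float:
    """Follow one random path of the Arikan martingale for the binary
    erasure channel with erasure probability z0 and the classical 2x2
    kernel.  Each step sends z to z^2 (the 'good' branch) or to
    2z - z^2 (the 'bad' branch), each with probability 1/2."""
    z = z0
    for _ in range(depth):
        if random.random() < 0.5:
            z = z * z
        else:
            z = 2 * z - z * z
    return z

# Empirically, almost all paths end up very close to 0 or 1 (polarization);
# the fraction converging to 0 approaches the capacity 1 - z0.
samples = [arikan_bec_martingale(0.3, 20) for _ in range(10_000)]
near_0 = sum(z < 1e-3 for z in samples) / len(samples)
near_1 = sum(z > 1 - 1e-3 for z in samples) / len(samples)
print(f"fraction near 0: {near_0:.3f}, near 1: {near_1:.3f}")
```

    How quickly the remaining paths escape the middle of the interval is exactly the kind of "local" behavior that a strong polarization analysis quantifies.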

    Induced minors and well-quasi-ordering

    A graph $H$ is an induced minor of a graph $G$ if it can be obtained from an induced subgraph of $G$ by contracting edges. Otherwise, $G$ is said to be $H$-induced minor-free. Robin Thomas showed that $K_4$-induced minor-free graphs are well-quasi-ordered by induced minors [Graphs without $K_4$ and well-quasi-ordering, Journal of Combinatorial Theory, Series B, 38(3):240--247, 1985]. We provide a dichotomy theorem for $H$-induced minor-free graphs and show that the class of $H$-induced minor-free graphs is well-quasi-ordered by the induced minor relation if and only if $H$ is an induced minor of the gem (the path on 4 vertices plus a dominating vertex) or of the graph obtained by adding a vertex of degree 2 to the complete graph on 4 vertices. To this end we prove two decomposition theorems which are of independent interest. Similar dichotomy results were previously given for subgraphs by Guoli Ding in [Subgraphs and well-quasi-ordering, Journal of Graph Theory, 16(5):489--502, 1992] and for induced subgraphs by Peter Damaschke in [Induced subgraphs and well-quasi-ordering, Journal of Graph Theory, 14(4):427--435, 1990].
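    For readers who want to see the two extremal graphs of the dichotomy concretely, here is a small sketch constructing them from the verbal descriptions in the abstract. The use of networkx and the node labels are our own choices, not part of the paper.

```python
import networkx as nx

# The gem: a path on 4 vertices plus a dominating vertex adjacent to all of them.
gem = nx.path_graph(4)                        # vertices 0-1-2-3
gem.add_edges_from((4, v) for v in range(4))  # vertex 4 dominates the path

# K_4 with an extra vertex of degree 2 attached to two of its vertices.
k4_plus = nx.complete_graph(4)
k4_plus.add_edges_from([(4, 0), (4, 1)])

print(sorted(d for _, d in gem.degree()))      # [2, 2, 3, 3, 4]
print(sorted(d for _, d in k4_plus.degree()))  # [2, 3, 3, 4, 4]
```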

    General Strong Polarization

    Arikan's exciting discovery of polar codes has provided an altogether new way to efficiently achieve Shannon capacity. Given a (constant-sized) invertible matrix $M$, a family of polar codes can be associated with this matrix and its ability to approach capacity follows from the polarization of an associated $[0,1]$-bounded martingale, namely its convergence in the limit to either $0$ or $1$. Arikan showed polarization of the martingale associated with the matrix $G_2 = \left(\begin{matrix} 1 & 0 \\ 1 & 1 \end{matrix}\right)$ to get capacity achieving codes. His analysis was later extended to all matrices $M$ that satisfy an obvious necessary condition for polarization. While Arikan's theorem does not guarantee that the codes achieve capacity at small blocklengths, it turns out that a "strong" analysis of the polarization of the underlying martingale would lead to such constructions. Indeed for the martingale associated with $G_2$ such a strong polarization was shown in two independent works ([Guruswami and Xia, IEEE IT '15] and [Hassani et al., IEEE IT '14]), resolving a major theoretical challenge of the efficient attainment of Shannon capacity. In this work we extend the result above to cover martingales associated with all matrices that satisfy the necessary condition for (weak) polarization. In addition to being vastly more general, our proofs of strong polarization are also simpler and modular. Specifically, our result shows strong polarization over all prime fields and leads to efficient capacity-achieving codes for arbitrary symmetric memoryless channels. We show how to use our analyses to achieve exponentially small error probabilities at lengths inverse polynomial in the gap to capacity. Indeed we show that we can essentially match any error probability with lengths that are only inverse polynomial in the gap to capacity.
    Comment: 73 pages, 2 figures. The final version appeared in JACM. This paper combines results presented in preliminary form at STOC 2018 and RANDOM 201
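    The codes in question are built from Kronecker powers of the kernel matrix. As a rough illustration (not the paper's construction), the sketch below applies the $n$-fold Kronecker power of $G_2$ over $\mathbb{F}_2$ to a message vector; a full polar encoder would also fix "frozen" coordinates and apply a bit-reversal permutation, both omitted here. The function name is ours.

```python
import numpy as np

def polar_transform(u):
    """Multiply a length-2^n binary row vector u by the n-fold Kronecker
    power of G_2 = [[1, 0], [1, 1]], reducing mod 2.  This is the raw
    polar transform, without frozen bits or bit-reversal."""
    G2 = np.array([[1, 0], [1, 1]])
    n = int(np.log2(len(u)))
    G = np.array([[1]])
    for _ in range(n):
        G = np.kron(G, G2)
    return (np.asarray(u) @ G) % 2

print(polar_transform([1, 0, 1, 1, 0, 0, 1, 0]))
```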

    Communication Complexity of Inner Product in Symmetric Normed Spaces

    We introduce and study the communication complexity of computing the inner product of two vectors, where the input is restricted w.r.t. a norm $N$ on the space $\mathbb{R}^n$. Here, Alice and Bob hold two vectors $v, u$ such that $\|v\|_N \le 1$ and $\|u\|_{N^*} \le 1$, where $N^*$ is the dual norm. They want to compute their inner product $\langle v, u \rangle$ up to an $\varepsilon$ additive term. The problem is denoted by $\mathrm{IP}_N$. We systematically study $\mathrm{IP}_N$, showing the following results:
    - For any symmetric norm $N$, given $\|v\|_N \le 1$ and $\|u\|_{N^*} \le 1$ there is a randomized protocol for $\mathrm{IP}_N$ using $\tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$ bits -- we will denote this by $\mathcal{R}_{\varepsilon,1/3}(\mathrm{IP}_{N}) \leq \tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$.
    - One-way communication complexity $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p}) \leq \mathcal{O}(\varepsilon^{-\max(2,p)} \cdot \log\frac{n}{\varepsilon})$, and a nearly matching lower bound $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p}) \geq \Omega(\varepsilon^{-\max(2,p)})$ for $\varepsilon^{-\max(2,p)} \ll n$.
    - One-way communication complexity $\overrightarrow{\mathcal{R}}(N)$ for a symmetric norm $N$ is governed by embeddings of $\ell_\infty^k$ into $N$. Specifically, while a small-distortion embedding easily implies a lower bound of $\Omega(k)$, we show that, conversely, non-existence of such an embedding implies a protocol with communication $k^{\mathcal{O}(\log \log k)} \log^2 n$.
    - For an arbitrary origin-symmetric convex polytope $P$, we show $\mathcal{R}(\mathrm{IP}_{N}) \le \mathcal{O}(\varepsilon^{-2} \log \mathrm{xc}(P))$, where $N$ is the unique norm for which $P$ is a unit ball, and $\mathrm{xc}(P)$ is the extension complexity of $P$.
    Comment: Accepted to ITCS 202
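    To make the setup concrete, here is a toy shared-randomness estimator for the $\ell_2$ case (where $N = N^* = \ell_2$): both parties apply the same Gaussian sketch with roughly $1/\varepsilon^2$ rows and Alice sends her rounded sketch. This is a generic Johnson-Lindenstrauss-style illustration of additive-error inner-product estimation, not the protocol from the paper, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_inner_product(v, u, eps, rng):
    """Toy shared-randomness estimator for <v, u> with ||v||_2, ||u||_2 <= 1:
    both parties use the same Gaussian sketch G with k ~ 1/eps^2 rows,
    Alice sends her sketched (and rounded) vector, and Bob returns
    <Gv, Gu> / k, an unbiased estimate of <v, u>."""
    n = len(v)
    k = int(np.ceil(1.0 / eps**2))
    G = rng.standard_normal((k, n))       # shared randomness
    gv = np.round(G @ v, decimals=3)      # Alice's message: k rounded reals
    gu = G @ u                            # Bob's local sketch
    return float(gv @ gu) / k

v = rng.standard_normal(1000); v /= np.linalg.norm(v)
u = rng.standard_normal(1000); u /= np.linalg.norm(u)
est = estimate_inner_product(v, u, eps=0.1, rng=rng)
print(abs(est - float(v @ u)))   # typically on the order of eps
```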

    An Improved Lower Bound for Sparse Reconstruction from Subsampled Hadamard Matrices

    We give a short argument that yields a new lower bound on the number of subsampled rows from a bounded, orthonormal matrix necessary to form a matrix with the restricted isometry property. We show that a matrix formed by uniformly subsampling rows of an $N \times N$ Hadamard matrix contains a $K$-sparse vector in the kernel, unless the number of subsampled rows is $\Omega(K \log K \log (N/K))$; our lower bound applies whenever $\min(K, N/K) > \log^C N$. Containing a sparse vector in the kernel precludes not only the restricted isometry property, but more generally the application of those matrices for uniform sparse recovery.
    Comment: Improved exposition and added an author
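    For context, the sensing matrices in question are formed as in the following sketch: take an orthonormal $N \times N$ Hadamard matrix, keep $m$ uniformly random rows, and rescale. Certifying the restricted isometry property would require checking all $K$-sparse supports; the snippet only inspects the conditioning of one random support and is purely illustrative, with parameter values chosen by us.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)

N, m = 1024, 200                      # ambient dimension, number of sampled rows
H = hadamard(N) / np.sqrt(N)          # orthonormal N x N Hadamard matrix
rows = rng.choice(N, size=m, replace=False)
A = np.sqrt(N / m) * H[rows, :]       # subsampled and renormalized sensing matrix

# A K-sparse vector in the kernel of A would certify failure of the RIP;
# here we only check conditioning on a single random support of size K.
K = 16
support = rng.choice(N, size=K, replace=False)
singular_values = np.linalg.svd(A[:, support], compute_uv=False)
print(singular_values.min(), singular_values.max())  # both near 1 if this support is well-conditioned
```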

    When Does Optimizing a Proper Loss Yield Calibration?

    Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties, the intuition being that for such losses the global optimum is to predict the ground-truth probabilities, which is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors that are unlikely to contain the ground truth. Under what circumstances does optimizing proper loss over a restricted family yield calibrated models? What precise calibration guarantees does it give? In this work, we provide a rigorous answer to these questions. We replace global optimality with a local optimality condition stipulating that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor with this local optimality satisfies smooth calibration as defined in Kakade-Foster (2008) and Błasiok et al. (2023). Local optimality is plausibly satisfied by well-trained DNNs, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal.
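    A toy numerical illustration of the local-optimality idea, under a synthetic setup of our own: a predictor that systematically over-reports probabilities can have its squared (Brier) loss reduced by a simple 1-Lipschitz post-processing (here just a constant shift), signalling that it is neither locally optimal nor calibrated. The constant-shift family is only a stand-in for the richer family of Lipschitz post-processings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic miscalibrated predictor: true probability p, reported f = clip(p + 0.15).
p = rng.uniform(size=50_000)
y = (rng.uniform(size=p.size) < p).astype(float)
f = np.clip(p + 0.15, 0.0, 1.0)

def brier(pred, y):
    return float(np.mean((pred - y) ** 2))

# Local-optimality style check (illustrative): can a 1-Lipschitz
# post-processing kappa(f) = clip(f + c) reduce the proper (squared) loss?
shifts = np.linspace(-0.3, 0.3, 61)
losses = [brier(np.clip(f + c, 0.0, 1.0), y) for c in shifts]
best = shifts[int(np.argmin(losses))]
print("loss as reported:", brier(f, y))
print("best constant shift:", best, "loss after shift:", min(losses))
```

    Here the loss drops noticeably for a shift near -0.15, exposing the over-reporting; for a calibrated predictor no such shift would help by more than sampling noise.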

    A Unifying Theory of Distance from Calibration

    We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is polynomially related to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground-truth distance, which we show is information-theoretically optimal in a natural model for measuring calibration that we term the prediction-only access model. Our work thus establishes fundamental lower and upper bounds on measuring the distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice.
    Comment: In STOC 202
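    As an illustration of the kind of kernel-based measure mentioned above, the sketch below computes a plug-in, MMD-style calibration statistic with a Laplace kernel on synthetic data. The exact normalization and definition of Laplace kernel calibration in the paper may differ, so treat this as a schematic estimator; the function name and kernel scale are our own.

```python
import numpy as np

def laplace_kernel_calibration_error(f, y, scale=0.1):
    """Plug-in estimate of a kernel calibration statistic with the Laplace
    kernel k(a, b) = exp(-|a - b| / scale): the mean over pairs i != j of
    (y_i - f_i)(y_j - f_j) k(f_i, f_j).  Near 0 for calibrated predictions,
    positive for miscalibrated ones."""
    r = y - f                                            # residuals
    K = np.exp(-np.abs(f[:, None] - f[None, :]) / scale)
    np.fill_diagonal(K, 0.0)
    n = len(f)
    return float(r @ K @ r) / (n * (n - 1))

rng = np.random.default_rng(0)
p = rng.uniform(size=5_000)
y = (rng.uniform(size=p.size) < p).astype(float)

print(laplace_kernel_calibration_error(p, y))                        # calibrated: near 0
print(laplace_kernel_calibration_error(np.clip(p + 0.2, 0, 1), y))   # miscalibrated: clearly larger
```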
